Toward a Statistical Foundation for Data Mining
Author
Abstract
KDD is an inherently statistical activity, and there has been considerable literature which draws upon statistical science. However, the usage has typically been vague and informal at best, and at worst of a seriously misleading nature. The present paper seeks to take a first step in remedying this problem by pairing precise mathematical descriptions of the concepts in KDD with practical interpretations and implications for specific KDD issues.
Similar resources
A New Analytical Solution for Determination of Acceptable Overall settlement of Heap Leaching Structures Foundation
The foundation of a heap leaching structure contains both artificial and natural materials. The geomembrane liner is the most important artificial isolating layer of these structures. The thickness of this layer is about 1 to 2 mm. Overall settlement of the foundation changes the original length of the geomembrane layer. If the strain of the geomembrane exceeds the allowable value, the layer wi...
Collective Data Mining: A New Perspective Toward Distributed Data Mining
This paper introduces collective data mining (CDM), a new approach to distributed data mining (DDM) from heterogeneous sites. It points out that naive approaches to distributed data analysis in a heterogeneous environment may face ambiguous situations and may lead to an incorrect global data model. It also observes that any function can be expressed in a distributed fashion using a set of a...
A Measurement-Theoretic Foundation of Rule Interestingness Evaluation
Many measures have been proposed and studied extensively in data mining for evaluating the usefulness or interestingness of discovered rules. They are normally defined based on structural characteristics or statistical information about the rules. The meaningfulness of each measure is interpreted based on some intuitive argument or mathematical properties. There does not exist a framework in wh...
Town trip forecasting based on data mining techniques
In this paper, a data mining approach is proposed for duration prediction of the town trips (travel time) in New York City. In this regard, at first, two novel approaches, including a mathematical and a statistical approach, are proposed for grouping categorical variables with a huge number of levels. The proposed approaches work based on the cost matrix generated by repetitive post-hoc tests f...
Scalable Techniques for Mining Causal Structures
Mining for association rules in market basket data has proved a fruitful area of research. Measures such as conditional probability (confidence) and correlation have been used to infer rules of the form "the existence of item A implies the existence of item B." However, such rules indicate only a statistical relationship between A and B. They do not specify the nature of the relationship: whether the pr...